Add containers/tei/{cpu,gpu}/1.6.0
#132
Conversation
LGTM!
How does this CPU multi-backend work? Does it check if there are
Yes, it tries to download the ONNX weights first, and otherwise falls back to using
Also @philschmid, see the logs below as a reference for how those look when running on CPU with a model from the Hub without the ONNX-converted weights, e.g. One minor nit within the logs is that it claims to have downloaded the
But that's not true and can be misleading, since it tries to initialize the ONNX backend when the file's not there. cc @OlivierDehaene for reference (happy to open an issue or contribute a fix within the TEI repository if needed!)
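As a rough illustration of the fallback behavior described above, here is a hypothetical sketch (not TEI's actual Rust implementation): prefer the ONNX backend when the model repository ships ONNX weights, and otherwise fall back to another backend. The function name, the file-listing input, and the `"candle"` fallback label are assumptions for illustration.

```python
from pathlib import PurePosixPath

def pick_cpu_backend(repo_files: list[str]) -> str:
    """Hypothetical backend selection: prefer ONNX weights when the
    model repo contains them, otherwise fall back to another backend."""
    if any(PurePosixPath(f).suffix == ".onnx" for f in repo_files):
        return "onnx"
    # No ONNX-converted weights in the repo: fall back (label assumed).
    return "candle"
```

In this sketch a repo with only `model.safetensors` would select the fallback backend, which matches the scenario in the logs above.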
Description
This PR adds a new container for the just-released TEI v1.6.0 (see the release notes at https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.6.0).
The main feature in TEI v1.6.0 compared to TEI v1.5.0 is support for multiple CPU backends, not just ONNX, meaning that it can also serve embedding models on CPU with backends other than ONNX (since not every model on the Hub ships an ONNX-converted version of the weights). Other additions include the General Text Embeddings (GTE) heads, an implementation of MPNet, fixes around the health checks, and much more.
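Once a TEI container is running, it can be queried via its `POST /embed` endpoint, which accepts a JSON body with an `inputs` field. Below is a minimal client sketch; the host and port are assumptions about how the container is exposed locally, and `embed_payload` is a helper name introduced here for illustration.

```python
import json

def embed_payload(texts: list[str]) -> bytes:
    """Build the JSON request body for TEI's /embed endpoint."""
    return json.dumps({"inputs": texts}).encode("utf-8")

if __name__ == "__main__":
    # Assumes a TEI container is reachable on localhost:8080.
    import urllib.request

    req = urllib.request.Request(
        "http://localhost:8080/embed",
        data=embed_payload(["What is deep learning?"]),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))  # list of embedding vectors
```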
Note
This PR also includes the changes from the https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.5.1 release.
To inspect the changes required to make the TEI container work on GCP, see the diff at: